Phonetic Transcription of Large Speech Corpora: How to Boost Efficiency Without Affecting Quality

Authors

  • Diana Binnenpoorte
  • Catia Cucchiarini

Abstract

This paper reports on an experiment aimed at improving an automatically generated phonetic transcription (AGT) of the Spoken Dutch Corpus (CGN). Several techniques for improving the AGT are explored, and each resulting AGT is compared to a reference transcription to determine its quality. The results indicate that implementing phonological rules improves the AGT for all speech styles considered in the experiment. Applying ASR techniques to model phonological rules that are less frequent in continuous speech reduces the number of substitution errors.
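One common way to compare an AGT with a reference transcription is to align the two phone strings by minimum edit distance and count substitutions, deletions, and insertions. The sketch below assumes that approach; the abstract does not say which alignment procedure the paper used. It also swaps in a single hypothetical rule (deletion of word-final /n/ after schwa) for the paper's actual rule set, and runs on three invented word pairs in SAMPA-like notation. `align_counts`, `apply_n_deletion`, `evaluate`, and `TOY_PAIRS` are illustrative names, not part of any CGN tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ErrorCounts:
    substitutions: int = 0
    deletions: int = 0   # reference phone missing from the AGT
    insertions: int = 0  # extra phone in the AGT

    def __add__(self, other: "ErrorCounts") -> "ErrorCounts":
        return ErrorCounts(self.substitutions + other.substitutions,
                           self.deletions + other.deletions,
                           self.insertions + other.insertions)

    @property
    def total(self) -> int:
        return self.substitutions + self.deletions + self.insertions


def align_counts(hyp: list[str], ref: list[str]) -> ErrorCounts:
    """Levenshtein-align hyp to ref and count S/D/I along one optimal path."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimal edit cost of aligning ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace from the bottom-right corner, classifying each step.
    i, j, s, d, ins = n, m, 0, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1]):
            s += int(ref[i - 1] != hyp[j - 1])  # match (0) or substitution (1)
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ErrorCounts(s, d, ins)


def apply_n_deletion(phones: list[str]) -> list[str]:
    """Hypothetical rule: drop a word-final /n/ that follows schwa (@)."""
    if len(phones) >= 2 and phones[-2] == "@" and phones[-1] == "n":
        return phones[:-1]
    return list(phones)


def evaluate(pairs: list[tuple[list[str], list[str]]],
             transform: Callable[[list[str]], list[str]]) -> ErrorCounts:
    """Sum error counts of transform(canonical) against each reference."""
    total = ErrorCounts()
    for canonical, reference in pairs:
        total = total + align_counts(transform(canonical), reference)
    return total


# Invented (canonical, reference) pairs; not data from the paper.
TOY_PAIRS = [
    (["l", "o:", "p", "@", "n"], ["l", "o:", "p", "@"]),         # final /n/ not heard
    (["m", "a:", "k", "@", "n"], ["m", "a:", "k", "@", "n"]),    # final /n/ heard
    (["z", "E", "G", "@", "n"], ["z", "E", "G", "@"]),           # final /n/ not heard
]

if __name__ == "__main__":
    print("canonical AGT :", evaluate(TOY_PAIRS, lambda p: list(p)))
    print("rule-based AGT:", evaluate(TOY_PAIRS, apply_n_deletion))
```

On this toy data the canonical AGT yields two insertions, while the rule-based AGT yields one deletion, because the rule over-applies to the second word, where the reference keeps the /n/. Reporting counts per error type makes such trade-offs visible.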

Similar resources

Automatic phonetic transcription of large speech corpora: a comparative study

This study investigates whether automatic transcription procedures can approximate manual phonetic transcriptions typically delivered with contemporary large speech corpora. We used ten automatic procedures to generate a broad phonetic transcription of well-prepared speech (read-aloud texts) and spontaneous speech (telephone dialogues). The resulting transcriptions were compared to manually ver...

Orthographic and Phonetic Annotation of Very Large Czech Corpora with Quality Assessment

Annotation is generally an integral part of a speech database. In this paper we present the common orthographic and phonetic annotation of large Czech databases. Phonetic annotation may be very important, as it gives more information than a pronunciation lexicon with possible pronunciation variants. Moreover, for the Czech language, phonetic annotation means only a small additional effort beyond standard ...

Validation and improvement of automatic phonetic transcriptions

The ultimate aim of our research is to show that good-quality phonetic transcriptions of large speech corpora can be obtained by employing automatic techniques initially developed for ASR. The experiment presented in this paper has two aims. The first is to show how the quality of an automatic transcription that is easily obtained through lexicon lookup can be measured in a way that is methodol...

Automatic Phonetic Transcription of Large Speech Corpora

Most large speech corpora are delivered with a lexicon that contains a canonical transcription of every word in the orthographic transcription. Such a lexicon can be used for generating a hypothetical ‘canonical’ phonetic transcription from the orthography. In addition, time and money permitting, some speech corpora are provided with a manually verified broad phonetic transcription of at least ...

Automatic generation of phonetic transcriptions for large speech corpora

We describe a method for the automatic production of phonetic transcriptions in large speech corpora. First, we focus on the application of different techniques for the generation of pronunciation variants. Then, we explain the application of a speech recognition system for selecting the acoustically best matching phonetic transcription. The system is evaluated on different test sets selected f...

Publication date: 2003